Chapter 2 Data Processing
Measures Of Central Tendency
After collecting and organising data, the next step in data processing is to analyse it using statistical techniques. These techniques help in extracting meaningful insights and summarising the data. Measures of central tendency are a set of statistical methods used for this purpose.
Definition and Purpose
Measures of central tendency aim to identify a single, representative value that best describes the center or typical value of a dataset. When a quantity differs from one observation to the next (like rainfall, elevation, population density, test scores), a single number that summarises all the observations is often needed to understand the dataset efficiently. This representative value usually lies somewhere in the middle of the data distribution, where observations tend to cluster.
These measures are also known as statistical averages because they provide a summary value for the entire data set. They offer a way to represent the entire collection of data points with just one number, making the dataset more comprehensible and easier to compare with others.
Types Of Measures
There are several common measures of central tendency:
- Mean: The arithmetic average, calculated by summing all values and dividing by the number of observations.
- Median: The middle value in a dataset that has been arranged in order. It divides the data into two equal halves.
- Mode: The value that appears most frequently in the dataset.
Each of these measures uses a different method to determine the 'center' of a distribution and is suitable for different types of data or analytical purposes.
Mean
The Mean is the most commonly used measure of central tendency. It represents the simple arithmetic average of a set of values. The method for calculating the mean differs depending on whether the data is ungrouped (individual values) or grouped (data sorted into classes or intervals).
Computing Mean From Ungrouped Data
Ungrouped data consists of individual observations that have not been sorted into frequency classes.
Direct Method
The direct method for calculating the mean of ungrouped data involves summing all the individual values and dividing the sum by the total number of observations.
The formula for the direct method is:
$ \bar{X} = \frac{\sum x}{N} $
Where:
- $\bar{X}$ = Mean
- $\sum x$ = The sum of all individual observations (x)
- $N$ = The total number of observations
Example 2.1: Calculate the mean rainfall for Malwa Plateau districts from the data given below:
| Districts in Malwa Plateau | Normal Rainfall in mm |
|---|---|
| Indore | 979 |
| Dewas | 1083 |
| Dhar | 833 |
| Ratlam | 896 |
| Ujjain | 891 |
| Mandsaur | 825 |
| Shajapur | 977 |
Answer:
Here, the individual observations (x) are the rainfall values for each district, and the number of observations (N) is the number of districts, which is 7.
Sum of rainfall values ($\sum x$) = $979 + 1083 + 833 + 896 + 891 + 825 + 977 = 6484$ mm
$ N = 7 $
$ \bar{X} = \frac{\sum x}{N} = \frac{6484}{7} = 926.29 \text{ mm} $
The mean rainfall for the Malwa Plateau in this example is 926.29 mm.
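The direct method can be checked with a few lines of Python. This is a minimal sketch using the figures from Example 2.1; the names `rainfall_mm` and `direct_mean` are illustrative and not part of the original notes.

```python
# Normal rainfall (mm) for the seven Malwa Plateau districts in Example 2.1.
rainfall_mm = {
    "Indore": 979, "Dewas": 1083, "Dhar": 833, "Ratlam": 896,
    "Ujjain": 891, "Mandsaur": 825, "Shajapur": 977,
}


def direct_mean(values):
    """Mean by the direct method: sum of observations divided by their count."""
    values = list(values)
    return sum(values) / len(values)


mean_rainfall = direct_mean(rainfall_mm.values())
print(round(mean_rainfall, 2))  # 926.29
```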
Indirect Method
The indirect method is often used for ungrouped data when dealing with a large number of observations or large values, as it simplifies calculations. It involves choosing an 'assumed mean' (A) and calculating the deviations (d) of each observation from this assumed mean ($d = x - A$). The mean is then calculated based on the assumed mean and the average of these deviations.
The formula for the indirect method is:
$ \bar{X} = A + \frac{\sum d}{N} $
Where:
- $\bar{X}$ = Mean
- $A$ = Assumed Mean (a constant value subtracted from each observation)
- $\sum d$ = The sum of the deviations ($d$) of all observations from the assumed mean
- $N$ = The total number of observations
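A short derivation, using only the symbols defined above, shows why this formula must agree with the direct method whatever value is chosen for $A$. Since each deviation is $d = x - A$:

$ \sum d = \sum (x - A) = \sum x - NA $

Dividing by $N$ and adding $A$ back:

$ A + \frac{\sum d}{N} = A + \frac{\sum x}{N} - A = \frac{\sum x}{N} = \bar{X} $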
Example 2.1 (Continued): Calculate the mean rainfall using the indirect method, taking 800 as the assumed mean.
| Districts in Malwa Plateau | Normal Rainfall (x) in mm | Deviation (d = x - 800) |
|---|---|---|
| Indore | 979 | 179 |
| Dewas | 1083 | 283 |
| Dhar | 833 | 33 |
| Ratlam | 896 | 96 |
| Ujjain | 891 | 91 |
| Mandsaur | 825 | 25 |
| Shajapur | 977 | 177 |
| Sum ($\sum$) | 6484 | 884 |
Answer:
$ A = 800 $
Sum of deviations ($\sum d$) = $179 + 283 + 33 + 96 + 91 + 25 + 177 = 884$
$ N = 7 $
$ \bar{X} = A + \frac{\sum d}{N} = 800 + \frac{884}{7} = 800 + 126.29 = 926.29 \text{ mm} $
As expected, the mean calculated by the indirect method is the same as that calculated by the direct method.
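The same check can be sketched in Python. The function name `assumed_mean_method` is hypothetical; the data and the assumed mean of 800 come from Example 2.1. Any other assumed mean should give the same result, which the last line illustrates with 900.

```python
def assumed_mean_method(values, assumed_mean):
    """Mean by the indirect method: A + (sum of deviations from A) / N."""
    values = list(values)
    deviations = [x - assumed_mean for x in values]  # d = x - A
    return assumed_mean + sum(deviations) / len(values)


rainfall = [979, 1083, 833, 896, 891, 825, 977]

indirect = assumed_mean_method(rainfall, 800)
print(round(indirect, 2))  # 926.29

# A different assumed mean changes the deviations but not the final mean.
print(round(assumed_mean_method(rainfall, 900), 2))  # 926.29
```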
Computing Mean From Grouped Data
When data is presented as a frequency distribution (grouped into classes), the individual values within each class are not known. Instead, they are represented by the midpoint of their respective class interval.
Direct Method
In the direct method for grouped data, the midpoint of each class is multiplied by its frequency. These products (fx) are summed up, and the total sum is divided by the total number of observations (N, which is the sum of all frequencies).
The formula is:
$ \bar{X} = \frac{\sum fx}{N} $
Where:
- $\bar{X}$ = Mean
- $f$ = Frequency of each class
- $x$ = Midpoint of each class interval
- $N$ = Total number of observations ($\sum f$)
Example 2.2: Compute the average wage rate of factory workers using the data given in Table 2.2:
| Wage Rate (Rs./day) | Number of workers (f) |
|---|---|
| 50 - 70 | 10 |
| 70 - 90 | 20 |
| 90 - 110 | 25 |
| 110 - 130 | 35 |
| 130 - 150 | 9 |
| Total | 99 |
Answer:
To calculate the mean, we first need to find the midpoint (x) of each class and then the product (fx) of the midpoint and frequency for each class.
| Classes (Wage Rate) | Frequency (f) | Midpoints (x) | fx |
|---|---|---|---|
| 50-70 | 10 | 60 | 600 |
| 70-90 | 20 | 80 | 1600 |
| 90-110 | 25 | 100 | 2500 |
| 110-130 | 35 | 120 | 4200 |
| 130-150 | 9 | 140 | 1260 |
| Sum ($\sum$) | N = 99 | | $\sum fx = 10160$ |
$ \bar{X} = \frac{\sum fx}{N} = \frac{10160}{99} = 102.63 \text{ Rs./day (approx)} $
The average wage rate of the factory workers is approximately $\textsf{₹}102.63$ per day.
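As a rough sketch, the grouped calculation can be reproduced in Python. Representing each class as a `(lower, upper, frequency)` tuple and the name `grouped_mean` are assumptions made for illustration; the figures are those of Example 2.2.

```python
# Each class from Example 2.2 as (lower limit, upper limit, frequency).
wage_classes = [
    (50, 70, 10),
    (70, 90, 20),
    (90, 110, 25),
    (110, 130, 35),
    (130, 150, 9),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of (frequency x midpoint) divided by N."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n


mean_wage = grouped_mean(wage_classes)
print(round(mean_wage, 2))  # 102.63
```

Because every observation is replaced by its class midpoint, the result is an estimate of the mean rather than the exact mean of the underlying wages.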
Indirect Method
The indirect method for grouped data also uses an assumed mean to simplify calculations, which is especially useful when the midpoints or frequencies are large numbers. The assumed mean (A) is taken as the midpoint of one of the classes (often a class near the center). The deviation (d) of each class midpoint from the assumed mean is then calculated ($d = x - A$). Alternatively, if the class intervals are equal, the deviations can be coded (u) by dividing d by the class interval width (i), so that $u = d/i$.
The formula for the indirect method using deviations (d) is:
$ \bar{X} = A + \frac{\sum fd}{N} $
The formula using coded deviations (u) is:
$ \bar{X} = A + \frac{\sum fu}{N} \times i $
Where:
- $\bar{X}$ = Mean
- $A$ = Assumed Mean (midpoint of the assumed mean class)
- $f$ = Frequency of each class
- $d$ = Deviation of each class midpoint from the assumed mean ($x - A$)
- $u$ = Coded deviation ($d/i$)
- $N$ = Total number of observations ($\sum f$)
- $i$ = Class interval width (used only if intervals are equal and coded deviations are used)
Example 2.2 (Continued): Compute the average wage rate using the indirect method, taking the midpoint of the 90-110 class (which is 100) as the assumed mean. Also, use coded deviations as the class interval width is 20.
| Classes (Wage Rate) | Frequency (f) | Midpoints (x) | Deviation (d = x - 100) | fd | Coded Deviation (u = d/20) | fu |
|---|---|---|---|---|---|---|
| 50-70 | 10 | 60 | -40 | -400 | -2 | -20 |
| 70-90 | 20 | 80 | -20 | -400 | -1 | -20 |
| 90-110 | 25 | 100 | 0 | 0 | 0 | 0 |
| 110-130 | 35 | 120 | 20 | 700 | 1 | 35 |
| 130-150 | 9 | 140 | 40 | 360 | 2 | 18 |
| Sum ($\sum$) | N = 99 | | | $\sum fd = 260$ | | $\sum fu = 13$ |
Answer:
Using the formula with deviations (d):
$ A = 100 $
$ \sum fd = 260 $
$ N = 99 $
$ \bar{X} = A + \frac{\sum fd}{N} = 100 + \frac{260}{99} = 100 + 2.63 = 102.63 \text{ Rs./day (approx)} $
Using the formula with coded deviations (u):
$ A = 100 $
$ \sum fu = 13 $
$ N = 99 $
$ i = 20 $
$ \bar{X} = A + \frac{\sum fu}{N} \times i = 100 + \frac{13}{99} \times 20 = 100 + 0.1313 \times 20 = 100 + 2.63 = 102.63 \text{ Rs./day (approx)} $
Both indirect methods yield the same mean as the direct method.
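The coded-deviation version can be sketched in Python as well. The function name `grouped_mean_coded` and the tuple layout are illustrative assumptions; the sketch also assumes all classes share the same width, as in Example 2.2.

```python
def grouped_mean_coded(classes, assumed_mean, width):
    """Mean of grouped data using coded deviations u = (x - A) / i.

    Assumes every class has the same width `width`.
    """
    n = sum(f for _, _, f in classes)
    sum_fu = sum(
        f * ((lower + upper) / 2 - assumed_mean) / width
        for lower, upper, f in classes
    )
    return assumed_mean + (sum_fu / n) * width


wage_classes = [
    (50, 70, 10),
    (70, 90, 20),
    (90, 110, 25),
    (110, 130, 35),
    (130, 150, 9),
]

coded_mean = grouped_mean_coded(wage_classes, assumed_mean=100, width=20)
print(round(coded_mean, 2))  # 102.63
```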
Median
The Median (M) is a positional average. It represents the value of the middle observation in a dataset that has been arranged in ascending or descending order. The median divides the data into two equal halves: 50% of the observations are below the median, and 50% are above it.
The median is independent of the actual values of extreme observations, making it a suitable measure when the data is skewed or contains outliers.
Computing Median For Ungrouped Data
For ungrouped data, the steps to calculate the median are:
- Arrange the data in either ascending or descending order.
- Locate the position of the median using the formula: Value of $ (\frac{N+1}{2})^{\text{th}} $ item.
- If N is odd, the median is the value of the item at the calculated position.
- If N is even, the median is the average of the values of the two middle items (at positions $ N/2 $ and $ (N/2) + 1 $).
Example 2.3: Calculate median height of mountain peaks in parts of the Himalayas using the following data (in meters):
8,126; 8,611; 7,817; 8,172; 8,076; 8,848; 8,598
Answer:
There are 7 observations (N=7), which is an odd number.
1. Arrange the data in ascending order:
7,817; 8,076; 8,126; 8,172; 8,598; 8,611; 8,848
2. Locate the median position: $ (\frac{N+1}{2})^{\text{th}} \text{ item} = (\frac{7+1}{2})^{\text{th}} \text{ item} = (\frac{8}{2})^{\text{th}} \text{ item} = 4^{\text{th}} \text{ item} $
3. The 4th item in the arranged series is 8,172.
$ M = 8,172 \text{ m} $
If there were an even number of observations, say 8, you would average the values at the $ 8/2 = 4^{\text{th}} $ and $ (8/2)+1 = 5^{\text{th}} $ positions.
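The steps above can be sketched in Python. The function name `median_ungrouped` is hypothetical. Note that Python lists are indexed from 0, so the $ (\frac{N+1}{2})^{\text{th}} $ item of an odd-length series sits at index `n // 2`.

```python
def median_ungrouped(values):
    """Median of ungrouped data: middle item, or mean of the two middle items."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: the single middle item
        return ordered[mid]
    # even N: average of the items at positions N/2 and N/2 + 1 (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2


peaks_m = [8126, 8611, 7817, 8172, 8076, 8848, 8598]
print(median_ungrouped(peaks_m))  # 8172
```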
Computing Median For Grouped Data
For grouped data (frequency distribution), the median is found by first locating the median class and then using a formula to interpolate the median value within that class. The median class is the class interval where the cumulative frequency first exceeds or equals $ N/2 $.
The formula for calculating the median from grouped data is:
$ M = l + \frac{\frac{N}{2} - c}{f} \times i $
Where:
- $M$ = Median for grouped data
- $l$ = Lower limit of the median class
- $N$ = Total number of observations ($\sum f$)
- $c$ = Cumulative frequency of the class preceding the median class
- $f$ = Frequency of the median class
- $i$ = Class interval width
Example 2.4: Calculate the median for the following distribution:
| Class | f |
|---|---|
| 50-60 | 3 |
| 60-70 | 7 |
| 70-80 | 11 |
| 80-90 | 16 |
| 90-100 | 8 |
| 100-110 | 5 |
| Total | N=50 |
Answer:
1. Create a cumulative frequency (F) column.
| Class | Frequency (f) | Cumulative Frequency (F) |
|---|---|---|
| 50-60 | 3 | 3 |
| 60-70 | 7 | $3 + 7 = 10$ |
| 70-80 | 11 | $10 + 11 = 21$ (c) |
| 80-90 | 16 (f) | $21 + 16 = 37$ |
| 90-100 | 8 | $37 + 8 = 45$ |
| 100-110 | 5 | $45 + 5 = 50$ (N) |
| Total | N = 50 | |
2. Calculate $ N/2 $: $ N/2 = 50/2 = 25 $
3. Find the median class: Look down the cumulative frequency column for the first value that is greater than or equal to 25. This value is 37, which corresponds to the class 80-90. So, the median class is 80-90.
4. Identify the values for the formula:
- $l$ (Lower limit of median class) = 80
- $N$ (Total frequency) = 50
- $c$ (Cumulative frequency of pre-median class, i.e., the class before 80-90) = 21
- $f$ (Frequency of the median class 80-90) = 16
- $i$ (Class interval width) = $90 - 80 = 10$
5. Substitute the values into the median formula:
$ M = l + \frac{\frac{N}{2} - c}{f} \times i = 80 + \frac{25 - 21}{16} \times 10 $
$ M = 80 + \frac{4}{16} \times 10 = 80 + 0.25 \times 10 = 80 + 2.5 $
$ M = 82.5 $
The median of the distribution is 82.5.
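The interpolation in Example 2.4 can be sketched in Python. The function name `grouped_median` and the `(lower, upper, frequency)` tuples are assumptions made for illustration; the classes must be listed in ascending order.

```python
def grouped_median(classes):
    """Median of grouped data: M = l + ((N/2 - c) / f) * i."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cumulative = 0                    # c: cumulative frequency before this class
    for lower, upper, f in classes:
        if cumulative + f >= half:    # first class whose cumulative frequency reaches N/2
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f


distribution = [
    (50, 60, 3),
    (60, 70, 7),
    (70, 80, 11),
    (80, 90, 16),
    (90, 100, 8),
    (100, 110, 5),
]
print(grouped_median(distribution))  # 82.5
```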
Mode
The Mode (Z or Mo) is the value that appears most frequently in a dataset, that is, the observation with the highest frequency of occurrence. The mode is used less often in statistical analysis than the mean and median, but it is useful for identifying the most typical or common value in a distribution.
A dataset can have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all if no value is repeated.
Computing Mode For Ungrouped Data
To compute the mode for ungrouped data, simply identify the value that occurs with the highest frequency. Arranging the data in ascending or descending order can help in easily counting the frequency of each distinct value.
Example 2.5: Calculate mode for the following test scores in geography for ten students:
61, 10, 88, 37, 61, 72, 55, 61, 46, 22
Answer:
List the unique scores and count their frequencies:
- 10: occurs once
- 22: occurs once
- 37: occurs once
- 46: occurs once
- 55: occurs once
- 61: occurs three times
- 72: occurs once
- 88: occurs once
The score 61 occurs most frequently (3 times). Therefore, the mode is 61.
$ \text{Mode} = 61 $
This distribution is unimodal.
Example 2.6: Calculate the mode using a different sample of ten other students, who scored:
82, 11, 57, 82, 08, 11, 82, 95, 41, 11.
Answer:
List the unique scores and count their frequencies:
- 08: occurs once
- 11: occurs three times
- 41: occurs once
- 57: occurs once
- 82: occurs three times
- 95: occurs once
Both scores 11 and 82 occur with the highest frequency (3 times). Therefore, this dataset has two modes: 11 and 82.
$ \text{Modes} = 11, 82 $
This distribution is bimodal.
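Counting frequencies by hand becomes tedious for longer lists, so here is a minimal Python sketch of the same procedure. The function name `modes` is hypothetical, and returning an empty list when no value repeats is a design choice that mirrors the "no mode" case described earlier. The score written as 08 is entered as 8.

```python
from collections import Counter


def modes(values):
    """Return all values sharing the highest frequency (empty if none repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # no value is repeated: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


scores_2_5 = [61, 10, 88, 37, 61, 72, 55, 61, 46, 22]
scores_2_6 = [82, 11, 57, 82, 8, 11, 82, 95, 41, 11]

print(modes(scores_2_5))  # [61]
print(modes(scores_2_6))  # [11, 82]
```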
Comparison Of Mean, Median And Mode
Comparing the mean, median, and mode helps in understanding the characteristics of a data distribution, especially its shape (symmetry or skewness).
Normal Distribution
In a normal distribution (often represented graphically as a symmetrical, bell-shaped curve), the mean, median, and mode all coincide and are equal to the same value. The highest frequency occurs exactly at the center of the distribution, where the mean, median, and mode are located. In a normal distribution, data is symmetrically distributed around the center, with frequencies gradually decreasing as you move towards the extreme values.
Skewed Distributions
If a dataset is skewed (asymmetrical), the mean, median, and mode will generally not be equal. The relative positions of these measures indicate the direction of the skew.
- Positive Skew (Skewed to the right): The tail of the distribution extends towards higher values. The mode is the lowest value, followed by the median, and the mean is the highest value (Mode < Median < Mean). The mean is pulled towards the higher values by the longer tail.
- Negative Skew (Skewed to the left): The tail of the distribution extends towards lower values. The mean is the lowest value, followed by the median, and the mode is the highest value (Mean < Median < Mode). The mean is pulled towards the lower values by the longer tail.
The relationship between these measures provides insights into the shape of the distribution. The mean is sensitive to extreme values (outliers), while the median is not. The mode is useful for categorical data or identifying the most common category or value.
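The ordering of the three measures under skew can be illustrated with Python's standard `statistics` module. The two small datasets below are made up for this illustration and do not come from the notes; the second is simply the mirror image of the first.

```python
import statistics

# Hypothetical positively skewed data: a long tail towards higher values.
right_skewed = [2, 3, 3, 3, 4, 5, 6, 9, 19]
# Mirror image of the same data: a long tail towards lower values.
left_skewed = [20 - x for x in right_skewed]


def summary(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )


print(summary(right_skewed))  # (3, 4, 6)    -> Mode < Median < Mean
print(summary(left_skewed))   # (17, 16, 14) -> Mean < Median < Mode
```

The single large value 19 pulls the mean up to 6 while the median stays at 4, which is the sensitivity to extreme values described above.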